LLMs
Assets
1804.07461v3
PDF
1905.00537v3
PDF
AI_RMF_Playbook
PDF
Artificial Intelligence Risk Management Framework_ Generative Artificial Intelligence Profile - NIST.AI.600-1
PDF
cb6d9eca-en
PDF
Framework for Information Literacy for Higher Education - Framework_ILHE
PDF
NIST.AI.100-1
PDF
Towards a Standard for Identifying and Managing Bias in Artificial Intelligence - NIST.SP.1270
PDF
Authors
Alec Radford
Alethea Power
Alex Nichol
Alex Paino
Alex Ray
Alex Wang
Ali Farhadi
Amanpreet Singh
Anastasios Nikolas Angelopoulos
Andrew N. Carr
Andy Zou
Ari Holtzman
Ariel Herbert-Voss
Ashish Sabharwal
Augustus Odena
Banghua Zhu
Bob McGrew
Brooke Chan
Carissa Schoenick
Carrie Cai
Christopher Hesse
Christopher Manning
Clemens Winter
Collin Burns
Dacheng Li
Dan Hendrycks
Dario Amodei
Dave Cummings
David Dohan
Dawn Song
Dzmitry Bahdanau
Elizabeth Barnes
Ellen Jiang
Eric P. Xing
Evan Morikawa
Felipe Petroski Such
Felix Hill
Fotios Chantzis
Girish Sastry
Greg Brockman
Gretchen Krueger
Hao Zhang
Harm de Vries
Harri Edwards
Heewoo Jun
Heidy Khlaaf
Henrique Ponde de Oliveira Pinto
Henryk Michalewski
Igor Babuschkin
Ilya Sutskever
Ion Stoica
Isaac Cowhey
Jacob Austin
Jacob Hilton
Jacob Steinhardt
Jan Leike
Jared Kaplan
Jerry Tworek
Jie Tang
John Schulman
Joseph E. Gonzalez
Josh Achiam
Julian Michael
Karl Cobbe
Katie Mayer
Lianmin Zheng
Lukasz Kaiser
Maarten Bosma
Mantas Mazeika
Mark Chen
Matthew Knight
Matthias Plappert
Maxwell Nye
Michael Jordan
Michael Petrov
Michael Terry
Mikhail Pavlov
Miles Brundage
Mira Murati
Mohammad Bavarian
Nicholas Joseph
Nick Ryder
Nikita Nangia
Nikolas Tezak
Omer Levy
Oren Etzioni
Owain Evans
Oyvind Tafjord
Pamela Mishkin
Peter Clark
Peter Welinder
Philippe Tillet
Qiming Yuan
Quoc Le
Charles Sutton
Raul Puri
Reiichiro Nakano
Rowan Zellers
Sam McCandlish
Samuel R. Bowman
Scott Gray
Shantanu Jain
Siyuan Zhuang
Stephanie Lin
Steven Basart
Suchir Balaji
Tianle Li
Tushar Khot
Vedant Misra
Vineet Kosaraju
Wei-Lin Chiang
William Hebgen Guss
William Saunders
Wojciech Zaremba
Yada Pruksachatkun
Yejin Choi
Ying Sheng
Yonatan Bisk
Yonghao Zhuang
Yuri Burda
Zhanghao Wu
Zhuohan Li
Zi Lin
Benchmarks
AI2 Reasoning Challenge (ARC)
Chatbot Arena
GLUE (General Language Understanding Evaluation)
GSM8K (Grade School Math 8K)
HellaSwag (Harder Endings, Longer contexts and Low-shot Activities for Situations With Adversarial Generations)
HumanEval
MBPP (Mostly Basic Programming Problems)
MMLU (Massive Multitask Language Understanding)
MT Bench
SuperGLUE
TruthfulQA
WinoGrande
Concepts
Chain of Thought
Ecological Validity
Imitative Falsehoods
Natural Language Inference
Natural Language Understanding
Textual Entailment
Organizations
DeepMind
Facebook AI Research
New York University
NIST
University of Washington
Papers
Chatbot Arena- An Open Platform for Evaluating LLMs by Human Preference
Evaluating Large Language Models Trained on Code
GLUE- A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
HellaSwag- Can a Machine Really Finish Your Sentence?
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Measuring Massive Multitask Language Understanding
Program Synthesis with Large Language Models
SuperGLUE- A Stickier Benchmark for General-Purpose Language Understanding Systems
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Towards Ecologically Valid Research on Language User Interfaces
Training Verifiers to Solve Math Word Problems
TruthfulQA- Measuring How Models Mimic Human Falsehoods
WinoGrande- An Adversarial Winograd Schema Challenge at Scale
Publications
Framework for Information Literacy for Higher Education
NIST AI 100-1
NIST AI 600-1
NIST AI RMF Playbook
NIST SP 1270
OECD Framework for the Classification of AI systems
Templates
Academic Paper
Webpages
AI - Overview of common LLM Benchmarks
AI Fairness 360
An In-depth Guide to Benchmarking LLMs
An introduction to code LLM benchmarks for software engineers
Benchmarking AI
DataComp
DataPerf
Errors in the MMLU- The Deep Learning Benchmark is Wrong Surprisingly Often
Tensorflow - Fairness Indicators
Annotated Bibliography
Checklist Notes
index
Reading List
Taxonomy of LLM tasks maybe?